CutMix is a vital augmentation strategy that determines the performance and generalization ability of vision transformers (ViTs). However, the inconsistency between the mixed images and the corresponding labels harms its efficacy. Existing CutMix variants tackle this problem by generating more consistent mixed images or more precise mixed labels, but inevitably introduce heavy training overhead or require extra information, undermining ease of use. To this end, we propose an efficient and effective Self-Motivated image Mixing method (SMMix), which motivates both image and label enhancement by the model under training itself. Specifically, we propose a max-min attention region mixing approach that enriches the attention-focused objects in the mixed images. Then, we introduce a fine-grained label assignment technique that co-trains the output tokens of mixed images with fine-grained supervision. Moreover, we devise a novel feature consistency constraint to align features from mixed and unmixed images. Due to the subtle designs of the self-motivated paradigm, our SMMix is significant in its smaller training overhead and better performance than other CutMix variants. In particular, SMMix improves the accuracy of DeiT-T/S, CaiT-XXS-24/36, and PVT-T/S/M/L by more than +1% on ImageNet-1k. The generalization capability of our method is also demonstrated on downstream tasks and out-of-distribution datasets. Code of this project is available at https://github.com/ChenMnZ/SMMix.
translated by 谷歌翻译
As a neural network compression technique, post-training quantization (PTQ) transforms a pre-trained model into a quantized model using a lower-precision data type. However, the prediction accuracy will decrease because of the quantization noise, especially in extremely low-bit settings. How to determine the appropriate quantization parameters (e.g., scaling factors and rounding of weights) is the main problem facing now. Many existing methods determine the quantization parameters by minimizing the distance between features before and after quantization. Using this distance as the metric to optimize the quantization parameters only considers local information. We analyze the problem of minimizing local metrics and indicate that it would not result in optimal quantization parameters. Furthermore, the quantized model suffers from overfitting due to the small number of calibration samples in PTQ. In this paper, we propose PD-Quant to solve the problems. PD-Quant uses the information of differences between network prediction before and after quantization to determine the quantization parameters. To mitigate the overfitting problem, PD-Quant adjusts the distribution of activations in PTQ. Experiments show that PD-Quant leads to better quantization parameters and improves the prediction accuracy of quantized models, especially in low-bit settings. For example, PD-Quant pushes the accuracy of ResNet-18 up to 53.08% and RegNetX-600MF up to 40.92% in weight 2-bit activation 2-bit. The code will be released at https://github.com/hustvl/PD-Quant.
translated by 谷歌翻译
由于空间和时间变化的模糊,视频脱毛是一个高度不足的问题。视频脱毛的直观方法包括两个步骤:a)检测当前框架中的模糊区域; b)利用来自相邻帧中清晰区域的信息,以使当前框架脱毛。为了实现这一过程,我们的想法是检测每个帧的像素模糊级别,并将其与视频Deblurring结合使用。为此,我们提出了一个新颖的框架,该框架利用了先验运动级(MMP)作为有效的深视频脱张的指南。具体而言,由于在曝光时间内沿其轨迹的像素运动与运动模糊水平呈正相关,因此我们首先使用高频尖锐框架的光流量的平均幅度来生成合成模糊框架及其相应的像素 - 像素 - 明智的运动幅度地图。然后,我们构建一个数据集,包括模糊框架和MMP对。然后,由紧凑的CNN通过回归来学习MMP。 MMP包括空间和时间模糊级别的信息,可以将其进一步集成到视频脱毛的有效复发性神经网络(RNN)中。我们进行密集的实验,以验证公共数据集中提出的方法的有效性。
translated by 谷歌翻译
我们研究了从单个运动毛发图像中恢复详细运动的挑战性问题。该问题的现有解决方案估算一个单个图像序列,而无需考虑每个区域的运动歧义。因此,结果倾向于收敛到多模式可能性的平均值。在本文中,我们明确说明了这种运动歧义,使我们能够详细地生成多个合理的解决方案。关键思想是引入运动引导表示,这是对仅有四个离散运动方向的2D光流的紧凑量量化。在运动引导的条件下,模糊分解通过使用新型的两阶段分解网络导致了特定的,明确的解决方案。我们提出了一个模糊分解的统一框架,该框架支持各种界面来生成我们的运动指导,包括人类输入,来自相邻视频帧的运动信息以及从视频数据集中学习。关于合成数据集和现实世界数据的广泛实验表明,所提出的框架在定性和定量上优于以前的方法,并且还具有生产物理上合理和多样的解决方案的优点。代码可从https://github.com/zzh-tech/animation-from-blur获得。
translated by 谷歌翻译
通过强迫连续重量的最多n非零,最近的N:M网络稀疏性因其两个有吸引力的优势而受到越来越多的关注:1)高稀疏性的有希望的表现。 2)对NVIDIA A100 GPU的显着加速。最近的研究需要昂贵的训练阶段或重型梯度计算。在本文中,我们表明N:M学习可以自然地将其描述为一个组合问题,该问题可以在有限的集合中寻找最佳组合候选者。由这种特征激励,我们以有效的分裂方式解决了n:m的稀疏性。首先,我们将重量向量分为$ c _ {\ text {m}}}^{\ text {n}} $组合s子集的固定大小N。然后,我们通过分配每个组合来征服组合问题,一个可学习的分数是共同优化了其关联权重。我们证明,引入的评分机制可以很好地模拟组合子集之间的相对重要性。通过逐渐去除低得分的子集,可以在正常训练阶段有效地优化N:M细粒稀疏性。全面的实验表明,我们的学习最佳组合(LBC)的表现始终如一,始终如一地比现成的N:m稀疏方法更好。我们的代码在\ url {https://github.com/zyxxmu/lbc}上发布。
translated by 谷歌翻译
滚动快门(RS)失真可以解释为在RS摄像机曝光期间,随着时间的推移从瞬时全局快门(GS)框架中挑选一排像素。这意味着每个即时GS帧的信息部分,依次是嵌入到行依赖性失真中。受到这一事实的启发,我们解决了扭转这一过程的挑战性任务,即从rs失真中的图像中提取未变形的GS框架。但是,由于RS失真与其他因素相结合,例如读数设置以及场景元素与相机的相对速度,因此仅利用临时相邻图像之间的几何相关性的型号,在处理数据中,具有不同的读数设置和动态场景的数据中遭受了不良的通用性。带有相机运动和物体运动。在本文中,我们建议使用双重RS摄像机捕获的一对图像,而不是连续的框架,而RS摄像机则具有相反的RS方向,以完成这项极具挑战性的任务。基于双重反转失真的对称和互补性,我们开发了一种新型的端到端模型,即IFED,以通过卢比时间对速度场的迭代学习来生成双重光流序列。广泛的实验结果表明,IFED优于天真的级联方案,以及利用相邻RS图像的最新艺术品。最重要的是,尽管它在合成数据集上进行了训练,但显示出在从现实世界中的RS扭曲的动态场景图像中检索GS框架序列有效。代码可在https://github.com/zzh-tech/dual-versed-rs上找到。
translated by 谷歌翻译
虽然RGB-Infrared跨型号人重新识别(RGB-IR Reid)在24小时智能监测中启用了巨大进展,但最先进的仍然严重依赖于微调想象的预先训练的网络。由于单模性质,这种大规模的预训练可以产生逆向模态图像检索性能的RGB偏置的表示。本文介绍了一个自我监督的预训练替代品,命名为模态感知多个粒度学习(MMGL),该学习(MMGL)直接从划痕上培训模型,而是在没有外部数据和复杂的调整技巧的情况下实现竞争结果。具体而言,MMGL将RGB-IR图像映射到共享潜在置换空间中,通过最大化循环 - 一致的RGB-IR图像补片之间的协议,进一步提高了局部辨别性。实验表明,MMGL在更快的训练速度(几小时内收敛)和求解数据效率(<5%数据大小)比想象预先训练更好地了解更好的表示(+ 6.47%的秩1)。结果还表明它概括为各种现有模型,损失,并且在数据集中具有有希望的可转换性。代码将被释放。
translated by 谷歌翻译
Denoising diffusion (score-based) generative models have recently achieved significant accomplishments in generating realistic and diverse data. These approaches define a forward diffusion process for transforming data into noise and a backward denoising process for sampling data from noise. Unfortunately, the generation process of current denoising diffusion models is notoriously slow due to the lengthy iterative noise estimations, which rely on cumbersome neural networks. It prevents the diffusion models from being widely deployed, especially on edge devices. Previous works accelerate the generation process of diffusion model (DM) via finding shorter yet effective sampling trajectories. However, they overlook the cost of noise estimation with a heavy network in every iteration. In this work, we accelerate generation from the perspective of compressing the noise estimation network. Due to the difficulty of retraining DMs, we exclude mainstream training-aware compression paradigms and introduce post-training quantization (PTQ) into DM acceleration. However, the output distributions of noise estimation networks change with time-step, making previous PTQ methods fail in DMs since they are designed for single-time step scenarios. To devise a DM-specific PTQ method, we explore PTQ on DM in three aspects: quantized operations, calibration dataset, and calibration metric. We summarize and use several observations derived from all-inclusive investigations to formulate our method, which especially targets the unique multi-time-step structure of DMs. Experimentally, our method can directly quantize full-precision DMs into 8-bit models while maintaining or even improving their performance in a training-free manner. Importantly, our method can serve as a plug-and-play module on other fast-sampling methods, e.g., DDIM.
translated by 谷歌翻译
Solar activity is usually caused by the evolution of solar magnetic fields. Magnetic field parameters derived from photospheric vector magnetograms of solar active regions have been used to analyze and forecast eruptive events such as solar flares and coronal mass ejections. Unfortunately, the most recent solar cycle 24 was relatively weak with few large flares, though it is the only solar cycle in which consistent time-sequence vector magnetograms have been available through the Helioseismic and Magnetic Imager (HMI) on board the Solar Dynamics Observatory (SDO) since its launch in 2010. In this paper, we look into another major instrument, namely the Michelson Doppler Imager (MDI) on board the Solar and Heliospheric Observatory (SOHO) from 1996 to 2010. The data archive of SOHO/MDI covers more active solar cycle 23 with many large flares. However, SOHO/MDI data only has line-of-sight (LOS) magnetograms. We propose a new deep learning method, named MagNet, to learn from combined LOS magnetograms, Bx and By taken by SDO/HMI along with H-alpha observations collected by the Big Bear Solar Observatory (BBSO), and to generate vector components Bx' and By', which would form vector magnetograms with observed LOS data. In this way, we can expand the availability of vector magnetograms to the period from 1996 to present. Experimental results demonstrate the good performance of the proposed method. To our knowledge, this is the first time that deep learning has been used to generate photospheric vector magnetograms of solar active regions for SOHO/MDI using SDO/HMI and H-alpha data.
translated by 谷歌翻译
本文旨在探讨如何合成对其进行训练的现有视频脱毛模型的近距离模糊,可以很好地推广到现实世界中的模糊视频。近年来,基于深度学习的方法已在视频Deblurring任务上取得了希望的成功。但是,对现有合成数据集培训的模型仍然遭受了与现实世界中的模糊场景的概括问题。造成故障的因素仍然未知。因此,我们重新审视经典的模糊综合管道,并找出可能的原因,包括拍摄参数,模糊形成空间和图像信号处理器〜(ISP)。为了分析这些潜在因素的效果,我们首先收集一个超高帧速率(940 fps)原始视频数据集作为数据基础,以综合各种模糊。然后,我们提出了一种新颖的现实模糊合成管道,该管道通过利用模糊形成线索称为原始爆炸。通过大量实验,我们证明了在原始空间中的合成模糊并采用与现实世界测试数据相同的ISP可以有效消除合成数据的负面影响。此外,合成的模糊视频的拍摄参数,例如,曝光时间和框架速率在改善脱毛模型的性能中起着重要作用。令人印象深刻的是,与在现有合成模糊数据集中训练的训练的模型合成的模糊数据训练的模型可以获得超过5DB PSNR的增益。我们认为,新颖的现实合成管道和相应的原始视频数据集可以帮助社区轻松构建自定义的Blur数据集,以改善现实世界的视频DeBlurring性能,而不是费力地收集真实的数据对。
translated by 谷歌翻译